Voxtral Realtime: enable bf16 for Metal backend with quantization #17845

mergennachin wants to merge 1 commit into `main`
Conversation
🔗 Helpful Links: 🧪 see artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17845

❌ As of commit 52027ff with merge base 6db7f4c: 2 new failures, 1 cancelled job, 4 unrelated failures.

- NEW FAILURES: the following jobs have failed.
- CANCELLED JOB: the following job was cancelled; please retry.
- FLAKY: the following job failed, but likely due to flakiness present on trunk.
- BROKEN TRUNK: the following jobs failed, but the failures were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.

This comment was automatically generated by Dr. CI.
Pull request overview

Enables and recommends bf16 for Voxtral Realtime exports on Metal when using quantization, updating CI export arguments and user-facing docs to reflect the preferred configuration for memory and throughput.

Changes:

- Update the Voxtral Realtime docs to include bf16 memory footprint numbers and recommend `--dtype bf16` for Metal quantized exports.
- Adjust the example Metal export command(s) to include `--dtype bf16` alongside `fpa4w`.
- Update the Metal CI export script to pass `--dtype bf16` for the `quantized-int4-metal` configuration.
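The docs changes above add bf16 memory footprint numbers. The arithmetic behind such numbers can be sketched as below; note that the parameter count used here is purely illustrative, not a figure from the Voxtral Realtime docs, and int4 packing overheads (scales, zero points) are ignored.

```python
# Per-element storage cost for the dtypes discussed in this PR.
BYTES_PER_ELEMENT = {"fp32": 4, "bf16": 2, "int4": 0.5}

def weight_bytes(num_params: int, dtype: str) -> float:
    """Raw weight storage for `num_params` parameters in the given dtype."""
    return num_params * BYTES_PER_ELEMENT[dtype]

params = 4_700_000_000  # illustrative parameter count, not from the docs
gib = 1024 ** 3
for dtype in ("fp32", "bf16", "int4"):
    print(f"{dtype}: {weight_bytes(params, dtype) / gib:.1f} GiB")
```

This is why bf16 halves the unquantized footprint relative to fp32, and why it pairs well with fpa4w quantization for activations and other non-quantized tensors.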
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| examples/models/voxtral_realtime/model.md | Updates memory calculations and guidance around bf16 + quantization for Metal/CUDA. |
| examples/models/voxtral_realtime/export_voxtral_rt.py | Updates usage example to show Metal export with bf16 + fpa4w. |
| examples/models/voxtral_realtime/README.md | Updates Metal backend table and export examples to recommend bf16 with fpa4w. |
| .ci/scripts/export_model_artifact.sh | Ensures the Metal int4 quantized CI export passes `--dtype bf16`. |
The Metal AOTI backend already handles bf16 correctly (fp32 attention masks, fp32 RoPE upcast, dtype-agnostic KV caches and SDPA). Enable `--dtype bf16` as the default recipe for Metal CI and update all documentation to recommend bf16 with fpa4w quantization. Also fix a Metal shader compilation bug in the streaming encoder where `bool.to(bf16)` generates `bfloat tmp = 0.0;`: Metal Shading Language doesn't support implicit float-to-bfloat literal conversion. Use `.float()` instead and let `mul_` handle the type promotion.
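The shader fix described in the commit message can be illustrated with a minimal PyTorch sketch. The function names here are hypothetical, and the actual streaming-encoder code may differ; this only shows the cast pattern the commit describes.

```python
import torch

def apply_mask_buggy(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Works in eager mode, but a bool -> bf16 cast can lower to a Metal
    # shader containing `bfloat tmp = 0.0;`, which MSL rejects because it
    # has no implicit float-to-bfloat literal conversion.
    return scores * mask.to(scores.dtype)

def apply_mask_fixed(scores: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Cast the bool mask to fp32 instead, and multiply in place so that
    # type promotion keeps the result in the scores' dtype (bf16).
    return scores.mul_(mask.float())
```

Both variants compute the same values in eager mode; only the lowered Metal shader differs.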
Force-pushed from 40b6144 to 52027ff.